This is one page of the R Handbook for Epidemiologists, but is being printed as a stand-alone page.
You can find the complete handbook on Github
Spatial aspects of your data can provide a lot of insights into the situation of the outbreak to answer questions such as:
In this section, we will explore basic spatial data visualization methods using tmap and ggplot2 packages. We will also walk through some of the basic spatial data management and querying methods with the sf package.
Choropleth map
Density heatmap
Health facility catchment area
Load packages
First, load the packages required for this analysis:
pacman::p_load(rio, # to import data
here, # to locate files
tidyverse, # to clean, handle, and plot the data (includes ggplot2 package)
sf, # to manage spatial data using a Simple Feature format
tmap,# to produce simple maps, works for both interactive and static maps
janitor, # to clean column names
OpenStreetMap # to add OSM basemap in ggplot map
) Sample case data
# import aggregated case counts of disease X
linelist <- rio::import(here::here("data", "linelist_cleaned.rds"))
linelist <- linelist[sample(nrow(linelist), 1000),]
# Create sf object
linelist_sf <-
linelist %>%
st_as_sf(coords = c("lon", "lat"), crs = 4326)Sierra Leone: Admin boundary shapefiles
Data downloaded from HDX: https://data.humdata.org/dataset/sierra-leone-all-ad-min-level-boundaries
# ADM3 level
sle_adm3 <-
sf::read_sf(here::here("data/shp", "sle_adm3.shp")) %>% janitor::clean_names() %>%
filter(admin2name %in% c("Western Area Urban", "Western Area Rural"))Sierra Leone: Population by ADM3
Data downloaded from HDX: https://data.humdata.org/dataset/sierra-leone-population
# Population by ADM3
sle_adm3_pop <-
read.csv(here::here("data/population", "sle_admpop_adm3_2020.csv")) %>% janitor::clean_names()Sierra Leone: Health facility data from OpenStreetMap
Data downloaded from HDX: https://data.humdata.org/dataset/hotosm_sierra_leone_health_facilities
The easiest way to plot the XY coordinates (points) is to draw a map directly from the sf object which we created in the preparation section.
tmap offers simple mapping capabilities for both static (plot mode) and interactive (view mode) with just a few lines of codes.
This blog provides a good comparison among different mapping options in R. https://rstudio-pubs-static.s3.amazonaws.com/324400_69a673183ba449e9af4011b1eeb456b9.html
tmap_mode("plot") # or "plot"
#tm_shape(sle_adm3, bbox = st_bbox(linelist_sf)) +
tm_shape(sle_adm3, bbox = c(-13.3,8.43, -13.2,8.5)) +
tm_polygons(col = "#F7F7F7") +
tm_borders(col = "#000000", lwd = 2) +
tm_text("admin3name") +
tm_shape(linelist_sf) + tm_dots(size=0.08, col='blue') Choropleth maps can be useful to visualize your data by pre-defined area usually by administrative unit or health area for outbreak response to be able to target resources for specific area high incidence rates for example.
The current linelist data does not contain any information about the administrative units. Although it is ideal to store such information during the initial data collection phase, we can also assign administrative units to individual cases based on their spatial relationships (i.e. point intersects with a polygon).
sf package offers various methods for spatial joins. See more documentation about the st_join method and spatial join types here: https://r-spatial.github.io/sf/reference/geos_binary_pred.html
Spatial assign administrative units to cases First spatially intersect our case locations (points) with the ADM3 boundaries (polygons)
linelist_adm <-
linelist_sf %>%
sf::st_join(sle_adm3, join = st_intersects) %>%
select(names(linelist_sf), admin3name, admin3pcod)
# Now you will see the ADM3 names attached to each case
linelist_adm %>% select(case_id, admin3name)
## Simple feature collection with 1000 features and 2 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: -13.27125 ymin: 8.447961 xmax: -13.20589 ymax: 8.490183
## geographic CRS: WGS 84
## First 10 features:
## case_id admin3name geometry
## 5764 29a821 Mountain Rural POINT (-13.20736 8.452104)
## 2249 8188cd West III POINT (-13.2612 8.463342)
## 1441 d5e973 East I POINT (-13.21625 8.486004)
## 1207 d7d495 East II POINT (-13.21847 8.482719)
## 2463 946a96 East II POINT (-13.22173 8.482117)
## 2509 97ea41 East I POINT (-13.21563 8.488302)
## 509 ebe4df Mountain Rural POINT (-13.20866 8.452004)
## 4401 44c12d Mountain Rural POINT (-13.20877 8.461204)
## 659 7947aa West II POINT (-13.25079 8.473695)
## 1765 e972f8 West II POINT (-13.23298 8.462652)Case counts by ADM3
case_adm3 <-
linelist_adm %>% as_tibble() %>%
#filter(!is.na(admin3pcod)) %>%
group_by(admin3pcod, admin3name) %>%
summarise(cases = n()) %>%
arrange(desc(cases))
case_adm3
## # A tibble: 10 x 3
## # Groups: admin3pcod [10]
## admin3pcod admin3name cases
## <chr> <chr> <int>
## 1 SL040102 Mountain Rural 287
## 2 SL040208 West III 238
## 3 SL040207 West II 178
## 4 SL040204 East II 110
## 5 SL040203 East I 56
## 6 SL040201 Central I 52
## 7 SL040206 West I 28
## 8 SL040205 East III 27
## 9 SL040202 Central II 19
## 10 <NA> <NA> 5Choropleth mapping Now that we have the administrative unit names assigned to all cases, we can start mapping the case counts by area (choropleth maps).
Since we also have population data by ADM3, we can add this information to the case_adm3 table created previously.
# Add population data and calculate cases per 10K population
case_adm3 <-
case_adm3 %>%
left_join(sle_adm3_pop, by=c("admin3pcod"="adm3_pcode")) %>%
select(names(case_adm3), total) %>%
mutate(case_10kpop = round(cases/total * 10000, 3))
case_adm3
## # A tibble: 10 x 5
## # Groups: admin3pcod [10]
## admin3pcod admin3name cases total case_10kpop
## <chr> <chr> <int> <int> <dbl>
## 1 SL040102 Mountain Rural 287 33993 84.4
## 2 SL040208 West III 238 210252 11.3
## 3 SL040207 West II 178 145109 12.3
## 4 SL040204 East II 110 99821 11.0
## 5 SL040203 East I 56 68284 8.20
## 6 SL040201 Central I 52 69683 7.46
## 7 SL040206 West I 28 60186 4.65
## 8 SL040205 East III 27 500134 0.54
## 9 SL040202 Central II 19 23874 7.96
## 10 <NA> <NA> 5 NA NAJoin this table with the ADM3 polygons for mapping
# Add population data and calculate cases per 10K population
case_adm3_sf <-
case_adm3 %>%
left_join(sle_adm3, by="admin3pcod") %>%
select(objectid, admin3pcod, admin3name=admin3name.x, admin2name, admin1name,
cases, total, case_10kpop, geometry) %>%
st_as_sf()Mapping the results
# Number of cases
tmap_mode("plot")
tm_shape(case_adm3_sf) +
tm_polygons("cases") +
tm_text("admin3name")# Cases per 10K population
tmap_mode("plot")
tm_shape(case_adm3_sf) +
tm_polygons("case_10kpop",
breaks=c(0, 10, 50, 100),
palette = "Purples"
) +
tm_text("admin3name")We can also look at the combination of time and space by facetting the heatmaps.
Set parameters for the basemap using the OpenStreetMap package.
# Fit basemap by range of lat/long coordinates. Choose tile type
map <- openmap(c(max(linelist$lat, na.rm=T), max(linelist$lon, na.rm=T)), # limits of tile
c(min(linelist$lat, na.rm=T), min(linelist$lon, na.rm=T)),
zoom = NULL,
type = c("osm", "stamen-toner", "stamen-terrain","stamen-watercolor", "esri","esri-topo")[1],
mergeTiles = TRUE)
# Projection WGS84
map.latlon <- openproj(map, projection = "+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs")Heatmap by month of onset
# Extract month of onset
linelist$date_onset_ym <- format(linelist$date_onset, "%Y-%m")
# Simply facet above map by month of onset
# Plot map. Must be autoplotted to work with ggplot
OpenStreetMap::autoplot.OpenStreetMap(map.latlon)+
# Density tiles
ggplot2::stat_density_2d(aes(x = lon,
y = lat,
fill = ..level..,
alpha=..level..),
bins = 10,
geom = "polygon",
contour_var = "count",
data = linelist %>% filter(date_onset>='2014-08-01' & date_onset<='2015-01-31'),
show.legend = F) +
#scale_fill_gradient(low = "black", high = "red")+
labs(x = "Longitude",
y = "Latitude",
title = "Distribution of simulated cases by month of onset") +
facet_wrap(~ date_onset_ym, ncol = 3)It might be useful to know where the health facilities are located in relation to the disease hot spots.
Finding the nearest health facility We can use the st_nearest_feature method from the sf package to assign the cloest health facility to individual cases.
# Closet health facility to each case
linelist_sf_hf <-
linelist_sf %>%
st_join(sle_hf, join = st_nearest_feature) %>%
select(case_id, osm_id, name, amenity)We can see that “Den Clinic” is the closest health facility for about ~30% of the cases.
# Group cases by health facility
hf_catchment <-
linelist_sf_hf %>% as.data.frame() %>%
group_by(name) %>%
summarise(case_n = n()) %>%
arrange(desc(case_n))
hf_catchment
## # A tibble: 8 x 2
## name case_n
## <chr> <int>
## 1 Den Clinic 361
## 2 Shriners Hospitals for Children 314
## 3 GINER HALL COMMUNITY HOSPITAL 184
## 4 panasonic 51
## 5 Princess Christian Maternity Hospital 34
## 6 ARAB EGYPT CLINIC 29
## 7 <NA> 14
## 8 MABELL HEALTH CENTER 13Visualizing the results on the map
tmap_mode("view")
tm_shape(linelist_sf_hf) + tm_dots(size=0.08, col='name') +
tm_shape(sle_hf) + tm_dots(size=0.3, col='red') + tm_text("name") +
tm_view(set.view = c(-13.2284,8.4699, 13), set.zoom.limits = c(13,14))Cases within 30 mins Walking distance from the closest health facility
We can also explore how many cases are located within 2.5km (~30 mins) walking distance from the closest health facility.
Note: For more accurate distance calculations, it is better to re-project your sf object to the respective local map projection system such as UTM (Earth projected onto a planar surface). In this example, for simplicity we will stick to the World Geodetic System (WGS84) Geograhpic coordinate system (Earth represented in a spherical / round surface, therefore the units are in decimal degrees). We will use a general conversion of: 1 decimal degree = ~111km.
See more information about map projections and coordinate systems: https://www.esri.com/arcgis-blog/products/arcgis-pro/mapping/gcs_vs_pcs/
First create a circular buffer with a radius of ~2.5km aroudn each health facility
Intersect this with the cases
# Intersect the cases with the buffers
linelist_sf_hf_2k <-
linelist_sf_hf %>%
st_join(sle_hf_2k, join = st_intersects, left = TRUE) %>%
filter(osm_id.x==osm_id.y | is.na(osm_id.y)) %>%
select(case_id, osm_id.x, name.x, amenity.x, osm_id.y)Count the results
202 out of 1000 cases (20.2%, shown in red dots in the map below) live more than 30 mins away from the nearest health facility)
nrow(linelist_sf_hf_2k)
## [1] 1000
nrow(linelist_sf_hf_2k[is.na(linelist_sf_hf_2k$osm_id.y),])
## [1] 215Visualize the results
tmap_mode("view")
tm_shape(linelist_sf_hf) + tm_dots(size=0.08, col='name') +
tm_shape(sle_hf_2k) + tm_borders(col = "red", lwd = 2) +
tm_shape(linelist_sf_hf_2k[is.na(linelist_sf_hf_2k$osm_id.y),]) +tm_dots(size=0.1, col='red') +
tm_view(set.view = c(-13.2284,8.4699, 13), set.zoom.limits = c(13,14))R Simple Features and sf package https://cran.r-project.org/web/packages/sf/vignettes/sf1.html
R tmap package https://cran.r-project.org/web/packages/tmap/vignettes/tmap-getstarted.html
ggmap: Spatial Visualization with ggplot2 https://journal.r-project.org/archive/2013-1/kahle-wickham.pdf